Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts under Precision and Recall Constraints
Semantic annotations have to satisfy quality constraints to be useful for
digital libraries, which is particularly challenging on large and diverse
datasets. Confidence scores of multi-label classification methods typically
refer only to the relevance of particular subjects, disregarding indicators of
insufficient content representation at the document-level. Therefore, we
propose a novel approach that detects documents rather than concepts where
quality criteria are met. Our approach uses a deep, multi-layered regression
architecture, which comprises a variety of content-based indicators. We
evaluated multiple configurations using text collections from law and
economics, where the available content is restricted to very short texts.
Notably, we demonstrate that the proposed quality estimation technique can
determine subsets of the previously unseen data where considerable gains in
document-level recall can be achieved, while upholding precision at the same
time. Hence, the approach effectively performs a filtering that ensures high
data quality standards in operative information retrieval systems.
Comment: authors' manuscript, paper submitted to the TPDL-2018 conference, 12 pages
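The filtering step the abstract describes can be sketched as follows: a regression model maps content-based indicators of each document to a predicted quality score, and only documents above a threshold pass the filter. The indicator names, weights, and linear scoring function below are hypothetical stand-ins for illustration; the paper itself uses a deep, multi-layered regression architecture.

```python
def quality_score(doc, weights):
    """Linear stand-in for the paper's regression model: weighted sum
    of content-based indicators (each normalised to [0, 1])."""
    return sum(weights[k] * doc[k] for k in weights)

def filter_documents(docs, weights, threshold):
    """Keep only documents whose predicted quality meets the threshold,
    so recall/precision constraints hold on the retained subset."""
    return [d for d in docs if quality_score(d, weights) >= threshold]

# Hypothetical content-based indicators: title length and term overlap
# with a controlled vocabulary.
weights = {"title_length": 0.4, "vocab_overlap": 0.6}
docs = [
    {"id": "a", "title_length": 0.9, "vocab_overlap": 0.8},
    {"id": "b", "title_length": 0.2, "vocab_overlap": 0.1},
]
kept = filter_documents(docs, weights, threshold=0.5)
print([d["id"] for d in kept])  # ['a'] -- only the well-represented document survives
```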
Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules
Exploiting dependencies between labels is considered to be crucial for
multi-label classification. Rules are able to expose label dependencies such as
implications, subsumptions or exclusions in a human-comprehensible and
interpretable manner. However, the induction of rules with multiple labels in
the head is particularly challenging, as the number of label combinations which
must be taken into account for each rule grows exponentially with the number of
available labels. To overcome this limitation, algorithms for exhaustive rule
mining typically use properties such as anti-monotonicity or decomposability in
order to prune the search space. In the present paper, we examine whether
commonly used multi-label evaluation metrics satisfy these properties and
therefore are suited to prune the search space for multi-label heads.
Comment: Preprint version. To appear in: Proceedings of the Pacific-Asia
Conference on Knowledge Discovery and Data Mining (PAKDD) 2018. See
http://www.ke.tu-darmstadt.de/bibtex/publications/show/3074 for further
information. arXiv admin note: text overlap with arXiv:1812.0005
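The pruning principle the abstract refers to can be illustrated with support (the fraction of examples in which every label of a head is relevant), a classic anti-monotone measure: adding a label to a head can only lower it, so once a head falls below a minimum threshold, all of its supersets can be pruned. This is a generic illustration of anti-monotone pruning, not the paper's algorithm; the paper examines whether common multi-label evaluation measures admit the same treatment.

```python
from itertools import count  # stdlib only; count is unused but kept minimal

def support(head, examples):
    """Fraction of examples in which all labels of the head are relevant."""
    hits = sum(1 for labels in examples if head <= labels)
    return hits / len(examples)

def search_heads(label_set, examples, min_support):
    """Level-wise enumeration of multi-label heads with anti-monotone
    pruning: heads below min_support are never extended, because no
    superset can recover a higher support."""
    frequent = []
    candidates = [frozenset([l]) for l in sorted(label_set)]
    while candidates:
        survivors = [h for h in candidates if support(h, examples) >= min_support]
        frequent.extend(survivors)
        next_level = set()
        for h in survivors:  # only surviving heads are extended
            for l in label_set - h:
                next_level.add(h | frozenset([l]))
        candidates = list(next_level)
    return frequent

examples = [frozenset("AB"), frozenset("AB"), frozenset("A"), frozenset("C")]
heads = search_heads({"A", "B", "C"}, examples, min_support=0.5)
print(sorted("".join(sorted(h)) for h in heads))  # ['A', 'AB', 'B']
```

Head {C} has support 0.25 and is pruned immediately, so the exponential space of supersets containing C is never evaluated exhaustively.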
Learning Interpretable Rules for Multi-label Classification
Multi-label classification (MLC) is a supervised learning problem in which,
contrary to standard multiclass classification, an instance can be associated
with several class labels simultaneously. In this chapter, we advocate a
rule-based approach to multi-label classification. Rule learning algorithms are
often employed when one is not only interested in accurate predictions, but
also requires an interpretable theory that can be understood, analyzed, and
qualitatively evaluated by domain experts. Ideally, by revealing patterns and
regularities contained in the data, a rule-based theory yields new insights in
the application domain. Recently, several authors have started to investigate
how rule-based models can be used for modeling multi-label data. Discussing
this task in detail, we highlight some of the problems that make rule learning
considerably more challenging for MLC than for conventional classification.
While mainly focusing on our own previous work, we also provide a short
overview of related work in this area.
Comment: Preprint version. To appear in: Explainable and Interpretable Models
in Computer Vision and Machine Learning. The Springer Series on Challenges in
Machine Learning. Springer (2018). See
http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further
information.
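The interpretability argument can be made concrete with a toy rule-based multi-label model: rule bodies may test both input features and previously predicted labels, so label dependencies such as implications become explicit, human-readable rules. The feature and label names below are invented for illustration and do not come from the chapter.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    body: dict   # conditions on features (and possibly on labels)
    head: set    # labels predicted when the body fires

    def fires(self, state):
        return all(state.get(k) == v for k, v in self.body.items())

def apply_rules(rules, instance):
    """Iteratively apply rules until a fixpoint: predicted labels are
    added to the state, so they can trigger label-dependency rules."""
    predicted = set()
    changed = True
    while changed:
        changed = False
        for r in rules:
            state = dict(instance, **{l: True for l in predicted})
            if r.fires(state) and not r.head <= predicted:
                predicted |= r.head
                changed = True
    return predicted

rules = [
    Rule(body={"contains_goal": True}, head={"sports"}),
    # An explicit label implication: every "sports" document is also "news".
    Rule(body={"sports": True}, head={"news"}),
]
print(sorted(apply_rules(rules, {"contains_goal": True})))  # ['news', 'sports']
```

Each rule can be read off directly by a domain expert, which is exactly the kind of qualitative evaluation the chapter advocates.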
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research.
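The unifying view can be sketched through the shared interface of MTP problems: every problem supplies an n x m matrix Y of targets for n instances and m target variables, and the simplest baseline fits one independent model per target column (binary relevance for label matrices, independent regressors for numeric targets). The column-mean "model" below is a deliberately trivial placeholder, not a method from the paper.

```python
def fit_per_target(Y):
    """Fit one independent constant model (the column mean) per target
    column of the n x m target matrix Y."""
    n, m = len(Y), len(Y[0])
    return [sum(row[j] for row in Y) / n for j in range(m)]

# Multi-label data (0/1 relevance) and multivariate regression data
# are both instances of the same target-matrix interface:
Y_labels = [[1, 0], [1, 1], [0, 0], [1, 0]]
Y_numeric = [[2.0, 10.0], [4.0, 30.0]]
print(fit_per_target(Y_labels))   # [0.75, 0.25]
print(fit_per_target(Y_numeric))  # [3.0, 20.0]
```

Methods then differ in whether, and how, they exploit dependencies between the columns of Y, which is one of the key properties the paper uses to structure its overview.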
Scalable Text Classification with Sparse Generative Modeling
Machine learning technology faces challenges in handling "Big Data": vast
volumes of online data such as web pages, news stories and articles. A
dominant solution has been parallelization, but this does not make the tasks
less challenging. An alternative solution is using sparse computation methods
to fundamentally change the complexity of the processing tasks themselves.
This can be done by using both the sparsity found in natural data and
sparsified models. In this paper we show that sparse representations can be
used to reduce the time complexity of generative classifiers to build
fundamentally more scalable classifiers. We reduce the time complexity of
Multinomial Naive Bayes classification with sparsity and show how to extend
these findings into three multi-label extensions: Binary Relevance, Label
Powerset and Multi-label Mixture Models. To provide competitive performance we
provide the methods with smoothing and pruning modifications and optimize
model meta-parameters using direct search optimization. We report on
classification experiments on 5 publicly available datasets for large-scale
multi-label classification. All three methods scale easily to the largest
available tasks, with training times measured in seconds and classification
times in milliseconds, even with millions of training documents, features and
classes. The presented sparse modeling techniques should be applicable to many
other classifiers, providing the same types of fundamental complexity
reductions when applied to large-scale tasks.
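The core sparsity idea can be sketched for Multinomial Naive Bayes: a document is a sparse bag of word counts, so the class score log p(c) + sum over words w of n(w, d) * log p(w | c) is computed by iterating only over the document's nonzero counts, independent of vocabulary size. The toy model parameters and the flat fallback log-probability for unseen words are invented for illustration; the paper's actual methods add smoothing, pruning, and model sparsification on top of this.

```python
import math

def mnb_classify(doc_counts, log_prior, log_likelihood):
    """Return the class with the highest log joint score for a sparse
    document, touching only the document's nonzero word counts."""
    best_class, best_score = None, -math.inf
    for c in log_prior:
        score = log_prior[c]
        for word, count in doc_counts.items():  # sparse iteration
            score += count * log_likelihood[c].get(word, math.log(1e-6))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy two-class model over a tiny vocabulary.
log_prior = {"econ": math.log(0.5), "law": math.log(0.5)}
log_likelihood = {
    "econ": {"market": math.log(0.6), "court": math.log(0.1)},
    "law":  {"market": math.log(0.1), "court": math.log(0.6)},
}
print(mnb_classify({"court": 3, "market": 1}, log_prior, log_likelihood))  # law
```

The per-document cost is O(classes x nonzero terms) rather than O(classes x vocabulary), which is the kind of fundamental complexity reduction the paper builds its multi-label extensions on.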